
    Racing to hardware-validated simulation

    Processor simulators rely on detailed timing models of the processor pipeline to evaluate performance. The diversity of real-world processor designs mandates building flexible simulators that expose parts of the underlying model to the user in the form of configurable parameters. Consequently, the accuracy of modeling a real processor relies on both the accuracy of the pipeline model itself and the accuracy of adjusting the configuration parameters to the modeled processor. Unfortunately, processor vendors publicly disclose only a subset of their design decisions, raising the probability of introducing specification inaccuracies when modeling these processors. Inaccurately tuned model parameters cause the simulated processor to deviate from the actual one. In the worst case, improper parameters may lead to imbalanced pipeline models that compromise the simulation output. Therefore, simulation models should be hardware-validated before they are used for performance evaluation. As processors increase in complexity and diversity, validating a simulator model against real hardware becomes increasingly challenging and time-consuming. In this work, we propose a methodology for validating simulation models against real hardware. We create a framework that relies on micro-benchmarks to collect performance statistics on real hardware, and on machine learning-based algorithms to fine-tune the unknown parameters based on the accumulated statistics. We overhaul the Sniper simulator to support the ARM AArch64 instruction-set architecture (ISA), and introduce two new timing models for ARM-based in-order and out-of-order cores. Using our proposed simulator validation framework, we tune the in-order and out-of-order models to match the performance of a real-world implementation of the Cortex-A53 and Cortex-A72 cores with an average error of 7% and 15%, respectively, across a set of SPEC CPU2017 benchmarks.
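
    As a concrete picture of the tuning step described above, the parameter search can be viewed as black-box optimization: run the simulator with candidate values for the undisclosed parameters and keep the configuration that best matches the hardware measurements. The C sketch below, written for this summary, illustrates the idea with a random search; simulate_cpi, the toy cost model, and the parameter ranges are hypothetical stand-ins, not the paper's actual algorithm or the Sniper interface.

        #include <stdio.h>
        #include <stdlib.h>
        #include <math.h>

        /* Hypothetical stand-in for a full simulator run that returns the
         * simulated cycles-per-instruction for a candidate configuration. */
        static double simulate_cpi(int rob_size, int issue_width) {
            return 1.0 + 64.0 / rob_size + 0.5 / issue_width; /* toy model */
        }

        int main(void) {
            const double hw_cpi = 1.55; /* measured on hardware with micro-benchmarks */
            double best_err = INFINITY;
            int best_rob = 0, best_width = 0;

            srand(42);
            for (int i = 0; i < 1000; i++) {      /* random search over unknowns */
                int rob   = 32 + rand() % 193;    /* 32..224 ROB entries */
                int width = 1 + rand() % 8;       /* 1..8 instructions/cycle */
                double err = fabs(simulate_cpi(rob, width) - hw_cpi);
                if (err < best_err) {
                    best_err = err;
                    best_rob = rob;
                    best_width = width;
                }
            }
            printf("best: ROB=%d, width=%d, error=%.4f\n", best_rob, best_width, best_err);
            return 0;
        }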

    Design and implementation of an architecture-aware hardware runtime for heterogeneous systems

    Parallel computing has become the norm for gaining performance in multicore and heterogeneous systems. Many programming models allow this parallelism to be exploited with easy-to-use tools. In this work we focus on task-based programming models, in which the parallelism is expressed as pieces of work called tasks that have data dependencies among them and therefore have to be executed in a certain order. However, tasks that do not depend on any other running task can be executed in parallel.
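
    To make the dependency idea above concrete, the C sketch below uses OpenMP task pragmas with depend clauses; OmpSs, the task-based model this line of work builds on, uses similar source annotations. The two producer tasks are independent and may run in parallel, while the consumer task waits for both of them.

        #include <stdio.h>

        int main(void) {
            int a = 0, b = 0, c = 0;
            #pragma omp parallel
            #pragma omp single
            {
                #pragma omp task depend(out: a)  /* producer 1 */
                a = 1;

                #pragma omp task depend(out: b)  /* producer 2: no dependency on producer 1 */
                b = 2;

                /* consumer: ordered after both producers by the data dependencies */
                #pragma omp task depend(in: a, b) depend(out: c)
                c = a + b;

                #pragma omp taskwait
                printf("c = %d\n", c);           /* prints c = 3 */
            }
            return 0;
        }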

    Enabling HW-based task scheduling in large multicore architectures

    Dynamic Task Scheduling is an enticing programming model that aims to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW stack to efficiently evaluate dependencies at runtime and schedule work to available cores. Traditional SW-only systems incur scheduling overheads of around 30K processor cycles per task, which severely limit the (core count, task granularity) combinations that they can adequately handle. Previous work on HW-accelerated Task Scheduling has shown that such systems can support high-performance scheduling on processors with up to eight cores, but questions remained regarding the viability of such solutions at the greater core counts now frequently found in high-end SMP systems. This work presents an FPGA-proven, tightly-integrated, Linux-capable, 30-core RISC-V system with hardware-accelerated Task Scheduling. We use this implementation to show that HW Task Scheduling can still offer competitive performance at such a high core count, and describe how this organization includes hardware and software optimizations that make it even more scalable than previous solutions. Finally, we outline ways in which this architecture could be augmented to overcome inter-core communication bottlenecks, mitigating the cache-degradation effects usually involved in the parallelization of highly optimized serial code. This work is supported by the TEXTAROSSA project G.A. n. 956831, as part of the EuroHPC initiative; by the Spanish Government (grants PCI2021-121964, TEXTAROSSA; PDC2022-133323-I00, Multi-Ka; PID2019-107255GB-C21 MCIN/AEI/10.13039/501100011033; and CEX2021-001148-S); by Generalitat de Catalunya (2021 SGR 01007); and by FAPESP (grant 2019/26702-8). Peer reviewed. Postprint (published version).
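
    The 30K-cycle figure above directly bounds usable task granularity: a task with g cycles of useful work and per-task overhead o spends a fraction o / (g + o) of its time in the scheduler, so keeping that fraction below f requires g >= o * (1 - f) / f. The short C program below, written for this summary rather than taken from the paper, tabulates this bound; with o = 30K cycles, staying under 10% overhead already demands tasks of at least 270K cycles, which is the constraint that hardware acceleration of the scheduler relaxes.

        #include <stdio.h>

        int main(void) {
            const double o = 30000.0;                 /* SW scheduling overhead, cycles/task */
            const double fractions[] = {0.50, 0.10, 0.01};

            for (int i = 0; i < 3; i++) {
                double f = fractions[i];
                double g = o * (1.0 - f) / f;         /* minimum useful cycles per task */
                printf("overhead <= %2.0f%% needs tasks >= %.0f cycles\n", 100.0 * f, g);
            }
            return 0;
        }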

    Towards reconfigurable accelerators in HPC: Designing a multipurpose eFPGA tile for heterogeneous SoCs

    The goal of modern high-performance computing platforms is to combine low power consumption and high throughput. Within the European Processor Initiative (EPI), such an SoC platform, built to meet the new exascale requirements, is being developed and investigated. As part of this project, we introduce an embedded Field Programmable Gate Array (eFPGA), adding the flexibility to accelerate various workloads. In this article, we show our approach to designing the eFPGA tile that supports the EPI SoC. While eFPGAs are inherently reconfigurable, their initial design has to be fixed for tape-out. The design space of the eFPGA is explored and evaluated with different configurations of two HPC workloads, covering control-heavy and dataflow-heavy applications. As a result, we present a well-balanced eFPGA design that can host several use cases, and potential future ones, while allocating only 1% of the total EPI SoC area. Finally, our simulation results of the architectures on the eFPGA show substantial performance improvements over their software counterparts. The European Processor Initiative (EPI) project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 826647, from the Spanish Government (PID2019-107255GB-C21/AEI/10.13039/501100011033), and from Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). M. Moreto is partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104. Peer reviewed. Postprint (author's final draft).
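
    The design-space exploration mentioned above can be pictured as scoring candidate tile configurations against an area budget and the two workload classes. The C sketch below is a generic illustration invented for this summary; the candidate names, area numbers, speedups, and scoring rule are all hypothetical and do not come from the EPI evaluation flow.

        #include <stdio.h>

        struct config {
            const char *name;
            double area_mm2;       /* silicon cost of the tile variant */
            double speedup_ctrl;   /* speedup on the control-heavy workload */
            double speedup_data;   /* speedup on the dataflow-heavy workload */
        };

        int main(void) {
            /* Hypothetical eFPGA tile candidates. */
            struct config cands[] = {
                {"small, LUT-heavy",  1.2, 2.1, 1.8},
                {"medium, DSP-heavy", 2.4, 2.3, 4.0},
                {"large, balanced",   4.8, 3.0, 4.4},
            };
            const double budget = 3.0;  /* assumed mm^2 available for the tile */
            int best = -1;
            double best_score = 0.0;

            for (int i = 0; i < 3; i++) {
                if (cands[i].area_mm2 > budget)
                    continue;                     /* must fit the area budget */
                /* product rewards balance across both workload classes */
                double score = cands[i].speedup_ctrl * cands[i].speedup_data;
                if (score > best_score) {
                    best_score = score;
                    best = i;
                }
            }
            if (best >= 0)
                printf("selected: %s (score %.2f)\n", cands[best].name, best_score);
            return 0;
        }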

    OmpSs@cloudFPGA: An FPGA task-based programming model with message passing

    Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost in high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Field-Programmable Gate Arrays (FPGAs) are one such type of accelerator and are becoming widely available in data centers. This paper proposes OmpSs@cloudFPGA, which includes novel extensions to parallel task-based programming models that enable easy and efficient programming of heterogeneous clusters with FPGAs. The programmer only needs to annotate, with OpenMP-like pragmas, the tasks of the application that should be accelerated in the cluster of FPGAs. The proposed programming-model framework then automatically extracts the parts annotated with High-Level Synthesis (HLS) pragmas and synthesizes them into hardware accelerator cores for FPGAs. Additionally, our extensions include and support two novel features: 1) FPGA-to-FPGA direct communication, using an Application Programming Interface (API) similar to the Message Passing Interface (MPI) with one-to-one and collective communications, to alleviate the host communication channel bottleneck, and 2) creating and spawning work from inside the FPGAs onto their own accelerator cores, based on an MPI-rank-like identification. These features break the classical host-accelerator model, where the host (typically the CPU) generates all the work and distributes it to each accelerator. We also present an evaluation of OmpSs@cloudFPGA for different parallel strategies of the N-Body application on the IBM cloudFPGA research platform. Results show that for cluster sizes of up to 56 FPGAs, the performance scales linearly. To the best of our knowledge, this is the best performance obtained for N-body on FPGA platforms, reaching 344 Gpairs/s with 56 FPGAs. Finally, we compare the performance and power consumption of the proposed approach with those obtained by a classical execution on the MareNostrum 4 supercomputer, demonstrating that our FPGA approach reduces power consumption by an order of magnitude. This work has been done in the context of the IBM/BSC Deep Learning Center initiative. This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 754337 (EuroEXA), from the Spanish Government (PID2019-107255GB-C21/AEI/10.13039/501100011033), and from Generalitat de Catalunya (2017-SGR-1414 and 2017-SGR-1328). Peer reviewed. Postprint (author's final draft).
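
    A minimal sketch of how such task annotations typically look is shown below, in the OmpSs style of pairing a target-device directive with a task directive that declares data directions; the exact pragma spelling accepted by the OmpSs@cloudFPGA toolchain may differ, and the paper's MPI-like one-to-one and collective calls would be issued from inside tasks like this one. Treat this as an assumed illustration, not the framework's verbatim API.

        #include <stdio.h>

        #define N 4

        /* OmpSs-style annotation marking this task for FPGA synthesis; an
         * HLS tool turns the body into an accelerator core. The spelling
         * here is illustrative and may differ from the actual toolchain. */
        #pragma omp target device(fpga) copy_deps
        #pragma omp task in([n]x) out([n]y)
        void scale(const float *x, float *y, int n) {
            for (int i = 0; i < n; i++)
                y[i] = 2.0f * x[i];
        }

        int main(void) {
            float x[N] = {1.0f, 2.0f, 3.0f, 4.0f};
            float y[N];

            scale(x, y, N);        /* task creation: may execute on an FPGA */
            #pragma omp taskwait   /* wait for the accelerated task */

            printf("y[0] = %f\n", y[0]);
            return 0;
        }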

    A comprehensive survey on reinforcement-learning-based computation offloading techniques in Edge Computing Systems

    In recent years, the number of embedded computing devices connected to the Internet has increased exponentially. At the same time, new applications are becoming more complex and computationally demanding, which can be a problem for devices, especially when they are battery-powered. In this context, the concepts of computation offloading and edge computing, which allow applications to be fully or partially offloaded and executed on servers close to the devices in the network, have arisen and received increasing attention. The design of algorithms that decide which applications or tasks should be offloaded, and where to execute them, is therefore crucial. One of the options that has been gaining momentum lately is the use of Reinforcement Learning (RL) and, in particular, Deep Reinforcement Learning (DRL), which enables learning optimal or near-optimal offloading policies adapted to each particular scenario. Although the use of RL techniques to solve the computation offloading problem in edge systems has been covered by some surveys, it has been done in a limited way. For example, some surveys have analysed the use of RL to solve various networking problems, with computation offloading being one of them, but not the primary focus. Other surveys have reviewed techniques to solve the computation offloading problem, with RL being just one of the approaches considered. To the best of our knowledge, this is the first survey that specifically focuses on the use of RL and DRL techniques for computation offloading in edge computing systems. We present a comprehensive and detailed survey, where we analyse and classify the research papers in terms of use cases, network and edge computing architectures, objectives, RL algorithms, decision-making approaches, and the time-varying characteristics considered in the analysed scenarios. In particular, we include a series of tables to help researchers identify relevant papers based on specific features, and analyse which scenarios and techniques are most frequently considered in the literature. Finally, this survey identifies a number of research challenges, future directions, and areas for further study. Funded by Consejería de Educación de la Junta de Castilla y León and FEDER (VA231P20), and by Ministerio de Ciencia e Innovación and Agencia Estatal de Investigación (projects PID2020-112675RB-C42, PID2021-124463OB-I00, and RED2018-102585-T, funded by MCIN/AEI/10.13039/501100011033).
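
    The RL formulation surveyed above can be made concrete with a toy example: an agent observes a state (here, only the device load level), chooses between local execution and offloading, and updates a tabular Q-function from the observed reward. The C sketch below is a generic epsilon-greedy Q-learning illustration written for this summary; the two-state, two-action setup and the reward model are invented, and real systems add channel quality, server load, and task sizes to the state.

        #include <stdio.h>
        #include <stdlib.h>

        #define STATES 2   /* device load: 0 = low, 1 = high (toy abstraction) */
        #define ACTIONS 2  /* 0 = execute locally, 1 = offload to edge server  */

        int main(void) {
            double Q[STATES][ACTIONS] = {{0}};
            const double alpha = 0.1, gamma = 0.9, eps = 0.1;
            int s = 0;

            srand(1);
            for (int step = 0; step < 10000; step++) {
                /* epsilon-greedy action selection */
                int a = (rand() / (double)RAND_MAX < eps)
                            ? rand() % ACTIONS
                            : (Q[s][1] > Q[s][0]);
                /* invented reward model: offloading pays off only under high load */
                double r = ((s == 1 && a == 1) || (s == 0 && a == 0)) ? 1.0 : -1.0;
                int s2 = rand() % STATES;                  /* load varies randomly */
                double maxq = Q[s2][0] > Q[s2][1] ? Q[s2][0] : Q[s2][1];
                Q[s][a] += alpha * (r + gamma * maxq - Q[s][a]);  /* Q-learning update */
                s = s2;
            }
            printf("low load : local=%.2f offload=%.2f\n", Q[0][0], Q[0][1]);
            printf("high load: local=%.2f offload=%.2f\n", Q[1][0], Q[1][1]);
            return 0;
        }

    After training, the learned Q-values should favor offloading only in the high-load state, which mirrors the shape of the offloading policies these surveys classify.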

    Role of age and comorbidities in mortality of patients with infective endocarditis

    [Purpose]: The aim of this study was to analyse the characteristics of patients with infective endocarditis (IE) in three age groups and to assess the ability of age and the Charlson Comorbidity Index (CCI) to predict mortality. [Methods]: Prospective cohort study of all patients with IE included in the GAMES Spanish database between 2008 and 2015. Patients were stratified into three age groups: <65 years, 65 to 80 years, and ≥80 years. The area under the receiver-operating characteristic (AUROC) curve was calculated to quantify the diagnostic accuracy of the CCI to predict mortality risk. [Results]: A total of 3120 patients with IE (1327 <65 years; 1291 65–80 years; 502 ≥80 years) were enrolled. Fever and heart failure were the most common presentations of IE, with no differences among age groups. The proportion of patients ≥80 years who underwent surgery was significantly lower than in the other age groups (31.3%, <65 years; 20.5%, 65–79 years; 14.3%, ≥80 years). In-hospital mortality was lower in the <65-year group (20.3%, <65 years; 30.1%, 65–79 years; 34.7%, ≥80 years; p < 0.001), as was 1-year mortality (3.2%, <65 years; 5.5%, 65–80 years; 7.6%, ≥80 years; p = 0.003). Independent predictors of mortality were age ≥80 years (hazard ratio [HR]: 2.78; 95% confidence interval [CI]: 2.32–3.34), CCI ≥3 (HR: 1.62; 95% CI: 1.39–1.88), and non-performance of surgery (HR: 1.64; 95% CI: 11.16–1.58). When the three age groups were compared, the AUROC curve for CCI was significantly larger for patients aged <65 years (p < 0.001) for both in-hospital and 1-year mortality. [Conclusion]: There were no differences in the clinical presentation of IE between the groups. Age ≥80 years, high comorbidity (measured by CCI), and non-performance of surgery were independent predictors of mortality in patients with IE. CCI could help to identify those patients with IE and a surgical indication who present a lower risk of in-hospital and 1-year mortality after surgery, especially in the <65-year group.

    CIBERER : Spanish national network for research on rare diseases: A highly productive collaborative initiative

    Other funding: Instituto de Salud Carlos III (ISCIII); Ministerio de Ciencia e Innovación. CIBER (Center for Biomedical Network Research; Centro de Investigación Biomédica En Red) is a public national consortium created in 2006 under the umbrella of the Spanish National Institute of Health Carlos III (ISCIII). This innovative research structure comprises 11 specific areas dedicated to the main public health priorities in the National Health System. CIBERER, the thematic area of CIBER focused on rare diseases (RDs), currently consists of 75 research groups belonging to universities, research centers, and hospitals across the entire country. CIBERER's mission is to be a center that prioritizes and favors collaboration and cooperation between biomedical and clinical research groups, with special emphasis on the genetic, molecular, biochemical, and cellular aspects of RD research. This research is the basis for providing new tools for the diagnosis and therapy of low-prevalence diseases, in line with the objectives of the International Rare Diseases Research Consortium (IRDiRC), thus favoring translational research between the scientific environment of the laboratory and the clinical setting of health centers. In this article, we review CIBERER's 15-year journey and summarize the main results obtained in terms of internationalization, scientific production, contributions toward the discovery of new therapies and novel genes associated with diseases, cooperation with patients' associations, and many other topics related to RD research.

    Empirically supported psychological treatments for adults: A selective review

    Background: Psychological treatments have shown their efficacy, effectiveness, and efficiency in dealing with mental disorders. However, considering the scientific knowledge generated in recent years, no up-to-date review of empirically supported psychological treatments is available in the Spanish context. The main goal was to carry out a selective review of the main empirically supported psychological treatments for mental disorders in adults. Method: Levels of evidence and degrees of recommendation were collected based on the criteria proposed by the Spanish National Health System (Clinical Practice Guidelines) for different psychological disorders. Results: Psychological treatments have empirical support for addressing a wide range of psychological disorders. The level of empirical support gathered ranges from low to high depending on the psychological disorder analysed. The review indicates that certain fields of intervention need further investigation. Conclusions: Based on this selective review, psychology professionals will have rigorous, up-to-date information that allows them to make informed decisions when implementing empirically supported psychotherapeutic procedures according to the characteristics of the people seeking help.